Exploring EVENTS

Experiments

    1. Visualising Events Dataframe
    2. Exploring Tags Events
    3. Calculating Events Description Similarity
    4. Calculating Events Description Topic Modelling
    5. Exploring the Schedules of Events
      • 5.1 Getting the Frequency of Starting Dates of Events Schedules
      • 5.2 Getting the Frequency of End Dates of Events Schedules
    6. Exploring the Performances Tickets of Events Schedules
      • 6.1 Getting the Frequency of Price Tickets
      • 6.2 Getting the frequency of type (Standard, Children) tickets
      • 6.3 Exploring Performances Places
        • 6.3.1 Frequency of Performances per town
        • 6.3.2 Frequency of Type tickets per town
        • 6.3.3 Frequency of Price tickets type per town
        • 6.3.4 Frequency of Max_Price tickets per town
          • 6.3.4.1 Frequency of Free tickets per town
          • 6.3.4.2 Frequency of No Free tickets per town
      • 6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews
        • 6.4.1 Frequency of Price Tickets per Scottish City
        • 6.4.2 Frequency of Type Tickets per Scottish City
        • 6.4.3 Frequency of Schedules Dates per Event and per Scottish City
        • 6.4.4 Grouping Schedules per Event and Scottish City
        • 6.4.5 Exploring Tags per Schedule and Scottish Cities
          • 6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh
          • 6.4.5.2 Exploring the Frequency of schedules tags for Glasgow
        • 6.4.6 Histograms of starting/end schedules dates for Edinburgh
        • 6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time
          • 6.4.7.1 Frequency of schedules Starting Date in Scottish City
          • 6.4.7.2 Frequency of schedules Ending Date in Scottish City
          • 6.4.7.3 Scheduled tags and Starting Dates in Scottish City
          • 6.4.7.4 Scheduled tags and Ending Dates in Scottish City

0. Importing libraries and loading the JSON file with 5000 events into a dataframe

In [1]:
import json
import pandas as pd
import plotly.express as px
import os
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
import plotly.graph_objects as go
import numpy as np
from gensim.parsing.preprocessing import remove_stopwords
import re
In [2]:
with open('events.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)

1. Visualizing the events dataframe

In [3]:
df
Out[3]:
event_id list_id status created_ts modified_ts name schedules sort_name tags descriptions images website properties links
0 477e919c-161f-9a69-5334-20060018d1a0 1626528 live 2021-01-16T01:36:53 2021-02-10T09:35:38 The Art Post Gallery Monthly Launch [{'start_ts': '2022-02-05T19:00:00+00:00', 'en... Art Post Gallery Monthly Launch [visual art] [{'type': 'default', 'description': 'Every mon... [{'url': 'https://files.list.co.uk/images/2021... NaN NaN NaN
1 8654b892-a4f5-d79d-881e-43060018e8ed 1632493 live 2021-02-23T11:05:44 2021-02-23T17:43:01 Chameleons & venue change [{'start_ts': '2022-02-05T19:00:00+00:00', 'en... Chameleons & venue change [music] [{'type': 'third-party', 'description': 'Anyon... [{'url': 'https://files.list.co.uk/images/2021... http://chameleonsv.com/ NaN NaN
2 1a6c0f5c-197d-5907-7e60-bac500134434 1262644 live 2019-04-08T08:31:35 2021-02-25T11:24:59 Curious About Cambridge [{'start_ts': '2022-01-20T10:30:00+00:00', 'en... Curious About Cambridge [activities, days out, history, kids, traditio... [{'type': 'default', 'description': 'Self-guid... [{'url': 'https://files.list.co.uk/images/2019... http://www.curiousabout.co.uk {'phone.info': '01159502151'} NaN
3 53580f5c-197d-5907-cb4f-25c50012519f 1200543 live 2019-01-31T13:14:36 2021-02-25T11:25:51 Curious About Glasgow [{'start_ts': '2022-01-20T10:00:00+00:00', 'en... Curious About Glasgow [activities, days out, history, kids, traditio... [{'type': 'default', 'description': 'Explore G... [{'url': 'https://files.list.co.uk/images/2019... http://www.curiousabout.co.uk NaN NaN
4 88280f5c-197d-5907-7f38-15c500122da4 1191332 live 2022-01-19T20:17:53.839456 2021-02-25T11:27:08 Curious About Lichfield [{'start_ts': '2022-01-20T10:00:00+00:00', 'en... Curious About Lichfield [activities, days out, traditional & heritage,... [{'type': 'default', 'description': 'Discover ... [{'url': 'https://files.list.co.uk/images/2019... NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4995 d08faadd-4f1e-b60e-2eef-acc40001954c 103756 live 2004-06-30T14:28:42 2022-01-23T12:04:50 Dr Feelgood [{'start_ts': '2022-01-28T21:00:00+00:00', 'en... Dr Feelgood GENERIC [blues, dr feelgood, music, rock & pop] [{'type': 'default', 'description': 'Rhythm'n'... [{'url': 'https://files.list.co.uk/images/d/dr... http://www.drfeelgood.org NaN NaN
4996 80849d5c-197d-5907-f1d9-27b5000ef931 981297 live 2022-01-19T20:17:37.935042 2022-01-23T12:04:51 Flo & Joan [{'start_ts': '2022-02-02T19:30:00+00:00', 'en... Flo & Joan [comedy] [{'type': 'default', 'description': 'Musical c... [{'url': 'https://files.list.co.uk/images/2018... NaN NaN [{'url': 'http://twitter.com/FloandJoan', 'tit...
4997 8654b8c6-f5f5-d79d-e644-de16001b622e 1794606 live 2022-01-23T12:05:02 2022-01-23T12:05:02 Bookbug in Peebles [{'start_ts': '2022-01-24T00:00:00+00:00', 'en... Bookbug in Peebles [kids, activities] [{'type': 'third-party', 'description': '" We... NaN NaN NaN NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea [{'start_ts': '2022-01-24T14:00:00+00:00', 'en... Tasty Treats Afternoon Tea [afternoon tea, days out, food & drink] [{'type': 'third-party', 'description': 'Treat... NaN NaN NaN NaN
4999 8654b8c6-77f5-d79d-fc44-de16001b6233 1794611 live 2022-01-23T12:06:39 2022-01-23T12:06:39 Take 5 [{'start_ts': '2022-01-24T10:00:00+00:00', 'en... Take 5 [exhibition] [{'type': 'third-party', 'description': '“Take... NaN NaN NaN NaN

5000 rows × 14 columns

In [4]:
## selecting some columns

Experiment 2: Exploring Tags Events

We are going to separate the elements stored in each tag list into new rows, one row per tag.

In [5]:
df["tags"][0:5]
Out[5]:
0                                         [visual art]
1                                              [music]
2    [activities, days out, history, kids, traditio...
3    [activities, days out, history, kids, traditio...
4    [activities, days out, traditional & heritage,...
Name: tags, dtype: object
In [6]:
df_tags=df.explode('tags')
In [7]:
df_tags
Out[7]:
event_id list_id status created_ts modified_ts name schedules sort_name tags descriptions images website properties links
0 477e919c-161f-9a69-5334-20060018d1a0 1626528 live 2021-01-16T01:36:53 2021-02-10T09:35:38 The Art Post Gallery Monthly Launch [{'start_ts': '2022-02-05T19:00:00+00:00', 'en... Art Post Gallery Monthly Launch visual art [{'type': 'default', 'description': 'Every mon... [{'url': 'https://files.list.co.uk/images/2021... NaN NaN NaN
1 8654b892-a4f5-d79d-881e-43060018e8ed 1632493 live 2021-02-23T11:05:44 2021-02-23T17:43:01 Chameleons & venue change [{'start_ts': '2022-02-05T19:00:00+00:00', 'en... Chameleons & venue change music [{'type': 'third-party', 'description': 'Anyon... [{'url': 'https://files.list.co.uk/images/2021... http://chameleonsv.com/ NaN NaN
2 1a6c0f5c-197d-5907-7e60-bac500134434 1262644 live 2019-04-08T08:31:35 2021-02-25T11:24:59 Curious About Cambridge [{'start_ts': '2022-01-20T10:30:00+00:00', 'en... Curious About Cambridge activities [{'type': 'default', 'description': 'Self-guid... [{'url': 'https://files.list.co.uk/images/2019... http://www.curiousabout.co.uk {'phone.info': '01159502151'} NaN
2 1a6c0f5c-197d-5907-7e60-bac500134434 1262644 live 2019-04-08T08:31:35 2021-02-25T11:24:59 Curious About Cambridge [{'start_ts': '2022-01-20T10:30:00+00:00', 'en... Curious About Cambridge days out [{'type': 'default', 'description': 'Self-guid... [{'url': 'https://files.list.co.uk/images/2019... http://www.curiousabout.co.uk {'phone.info': '01159502151'} NaN
2 1a6c0f5c-197d-5907-7e60-bac500134434 1262644 live 2019-04-08T08:31:35 2021-02-25T11:24:59 Curious About Cambridge [{'start_ts': '2022-01-20T10:30:00+00:00', 'en... Curious About Cambridge history [{'type': 'default', 'description': 'Self-guid... [{'url': 'https://files.list.co.uk/images/2019... http://www.curiousabout.co.uk {'phone.info': '01159502151'} NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4997 8654b8c6-f5f5-d79d-e644-de16001b622e 1794606 live 2022-01-23T12:05:02 2022-01-23T12:05:02 Bookbug in Peebles [{'start_ts': '2022-01-24T00:00:00+00:00', 'en... Bookbug in Peebles activities [{'type': 'third-party', 'description': '" We... NaN NaN NaN NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea [{'start_ts': '2022-01-24T14:00:00+00:00', 'en... Tasty Treats Afternoon Tea afternoon tea [{'type': 'third-party', 'description': 'Treat... NaN NaN NaN NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea [{'start_ts': '2022-01-24T14:00:00+00:00', 'en... Tasty Treats Afternoon Tea days out [{'type': 'third-party', 'description': 'Treat... NaN NaN NaN NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea [{'start_ts': '2022-01-24T14:00:00+00:00', 'en... Tasty Treats Afternoon Tea food & drink [{'type': 'third-party', 'description': 'Treat... NaN NaN NaN NaN
4999 8654b8c6-77f5-d79d-fc44-de16001b6233 1794611 live 2022-01-23T12:06:39 2022-01-23T12:06:39 Take 5 [{'start_ts': '2022-01-24T10:00:00+00:00', 'en... Take 5 exhibition [{'type': 'third-party', 'description': '“Take... NaN NaN NaN NaN

14079 rows × 14 columns
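The behaviour of explode on a list column can be checked on a small example. The sketch below uses an invented three-row dataframe (the names and tags are illustrative, not taken from events.json) and shows two details worth knowing: the original index is repeated for events with several tags, and an empty tag list becomes a single row holding NaN.

```python
import pandas as pd

# Toy stand-in for the events dataframe (values are illustrative only).
toy = pd.DataFrame({
    "name": ["Event A", "Event B", "Event C"],
    "tags": [["music"], ["kids", "activities"], []],
})

# One row per (event, tag) pair.
exploded = toy.explode("tags")
print(exploded)

# The empty list of Event C turned into NaN, so drop it before counting tags.
tag_counts = exploded["tags"].dropna().value_counts()
print(tag_counts)
```

Because the index is repeated rather than renumbered, exploded.index can still be used to trace each row back to its event.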

In [8]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags
Out[8]:
tags number_of_times
477 music 2102
715 theatre 656
218 days out 562
174 comedy 544
168 clubs 497
... ... ...
375 illuminated trails 1
368 horticulture 1
364 historical 1
358 health 1
795 youth 1

796 rows × 2 columns

In [9]:
fig = px.line(g_tags, x="tags", y="number_of_times", title='Number of times that each tag appears')
fig.show()

Experiment 3: Description Similarity

Exploding the descriptions column

Each cell of the descriptions column holds a list of descriptions; we create one new row per element of that list.

In [10]:
df["descriptions"][0:5]
Out[10]:
0    [{'type': 'default', 'description': 'Every mon...
1    [{'type': 'third-party', 'description': 'Anyon...
2    [{'type': 'default', 'description': 'Self-guid...
3    [{'type': 'default', 'description': 'Explore G...
4    [{'type': 'default', 'description': 'Discover ...
Name: descriptions, dtype: object
In [11]:
df_descriptions=df.explode('descriptions')
In [12]:
df_d=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
In [13]:
df_desc=df_d[["event_id", "description"]]
In [14]:
df_desc
Out[14]:
event_id description
0 477e919c-161f-9a69-5334-20060018d1a0 Every month, the Art Post Galleries showcase w...
1 8654b892-a4f5-d79d-881e-43060018e8ed Anyone who witnessed their awe-inspiring gigs ...
2 1a6c0f5c-197d-5907-7e60-bac500134434 Self-guided heritage tours around Cambridge.
2 1a6c0f5c-197d-5907-7e60-bac500134434 Explore, Discover & Enjoy Cambridge with two s...
3 53580f5c-197d-5907-cb4f-25c50012519f Explore Glasgow with these two self-guided wal...
... ... ...
4996 80849d5c-197d-5907-f1d9-27b5000ef931 Multi-award winning musical comedy duo (and si...
4997 8654b8c6-f5f5-d79d-e644-de16001b622e "\n\nWe are so excited to bring back face to f...
4998 5f02b19c-161f-9a69-7732-ea16001b0884 Treat yourself to one of our Afternoon Teas fo...
4998 5f02b19c-161f-9a69-7732-ea16001b0884 Treat yourself to one of our Afternoon Teas fo...
4999 8654b8c6-77f5-d79d-fc44-de16001b6233 “Take 5” is an exhibition of work by five Argy...

5582 rows × 2 columns
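The two steps used above (explode, then expand each dict into columns) can be reproduced on toy data. The sketch below is an alternative formulation, not the notebook's code: it uses pd.json_normalize instead of apply(pd.Series), which may be faster on large frames, and it assumes that an event without descriptions holds NaN in that cell. All values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "event_id": ["e1", "e2", "e3"],
    "descriptions": [
        [{"type": "default", "description": "Short text"},
         {"type": "third-party", "description": "Longer text"}],
        [{"type": "default", "description": "Only one"}],
        float("nan"),  # assumed shape of an event without descriptions
    ],
})

# Step 1: one row per description dict (NaN cells are kept as they are).
exploded = toy.explode("descriptions").reset_index(drop=True)

# Step 2: expand the dicts into columns, skipping rows that hold no dict.
has_dict = exploded["descriptions"].apply(lambda v: isinstance(v, dict))
fields = pd.json_normalize(exploded.loc[has_dict, "descriptions"].tolist())
fields.index = has_dict[has_dict].index  # realign with the exploded rows

flat = exploded.drop(columns="descriptions").join(fields)
print(flat)
```

Rows without a description keep NaN in the new columns, which is what the later dropna(subset=['description']) step removes.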

Finding events with similar descriptions - Deep Learning - Transformers

In [15]:
# removing the rows whose description is empty
df_desc1=df_desc.dropna(subset=['description']).reset_index()
In [16]:
df_desc1[0:5]
Out[16]:
index event_id description
0 0 477e919c-161f-9a69-5334-20060018d1a0 Every month, the Art Post Galleries showcase w...
1 1 8654b892-a4f5-d79d-881e-43060018e8ed Anyone who witnessed their awe-inspiring gigs ...
2 2 1a6c0f5c-197d-5907-7e60-bac500134434 Self-guided heritage tours around Cambridge.
3 2 1a6c0f5c-197d-5907-7e60-bac500134434 Explore, Discover & Enjoy Cambridge with two s...
4 3 53580f5c-197d-5907-cb4f-25c50012519f Explore Glasgow with these two self-guided wal...
In [17]:
# total number of rows with descriptions
df_desc1.shape[0]
Out[17]:
5366
In [18]:
# selecting the description column
documents=df_desc1["description"].values
In [19]:
#d=documents[0:100]
#d=documents[:]
In [20]:
def clean_documents(text):
    text = re.sub(r'\S*@\S*\s?', '', text, flags=re.MULTILINE) # remove email
    text = re.sub(r'http\S+', '', text, flags=re.MULTILINE) # remove web addresses
    text = re.sub("\'", "", text) # remove single quotes
    text = remove_stopwords(text)
    return text

We are going to save clean documents in d

In [21]:
d=[]
for text in documents:
    d.append(clean_documents(text))
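To see what the cleaning steps do to a single string, here is a dependency-free sketch. It reuses the same three regular-expression steps but replaces gensim's remove_stopwords with a tiny hand-written stop-word set (TOY_STOPWORDS is hypothetical and much smaller than gensim's list), so its output will differ from clean_documents on real text.

```python
import re

# Hypothetical mini stop-word set, used only to keep this sketch free of
# third-party dependencies; the notebook relies on gensim's remove_stopwords.
TOY_STOPWORDS = {"the", "a", "at", "or", "for", "us", "to"}

def clean_text_sketch(text):
    text = re.sub(r'\S*@\S*\s?', '', text)  # drop tokens containing '@' (emails)
    text = re.sub(r'http\S+', '', text)     # drop web addresses
    text = text.replace("'", "")            # drop single quotes
    # split() also collapses the double spaces left behind by the removals
    words = [w for w in text.split() if w.lower() not in TOY_STOPWORDS]
    return " ".join(words)

sample = "Book at info@example.org or visit https://example.org for the artist's talk"
cleaned = clean_text_sketch(sample)
print(cleaned)  # Book visit artists talk
```

Note that removing single quotes merges possessives into the preceding word ("artist's" becomes "artists"), which is also what the original function does.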
     
In [22]:
# Using all-MiniLM-L6-v2 Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')
In [23]:
# Computing text_embeddings for the cleaned descriptions with the pretrained all-MiniLM-L6-v2 Transformer (the model is only used to encode; it is not trained here)
text_embeddings = model.encode(d, batch_size = 8, show_progress_bar = True)

In [24]:
np.shape(text_embeddings)
Out[24]:
(5366, 384)
In [25]:
### A small example of how to get an embedding vector from a description
In [26]:
first_description=df_desc1["description"].iloc[0]
first_description
first_description_embedding= model.encode(first_description, batch_size = 8, show_progress_bar = True)

Finding the similarity between documents

In [27]:
similarity_def=cosine_similarity(
    [first_description_embedding],
    text_embeddings)
In [28]:
similarities = cosine_similarity(text_embeddings)
print('pairwise dense output:\n {}\n'.format(similarities))
pairwise dense output:
 [[0.9999999  0.13797444 0.29161933 ... 0.10864848 0.10864848 0.39017975]
 [0.13797444 1.0000004  0.31839126 ... 0.03004319 0.03004319 0.26403916]
 [0.29161933 0.31839126 0.9999999  ... 0.1136207  0.1136207  0.30936757]
 ...
 [0.10864848 0.03004319 0.1136207  ... 1.         1.         0.24985477]
 [0.10864848 0.03004319 0.1136207  ... 1.         1.         0.24985477]
 [0.39017975 0.26403916 0.30936757 ... 0.24985477 0.24985477 1.        ]]
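For reference, cosine similarity between two vectors is their dot product divided by the product of their norms. The sketch below computes the full pairwise matrix with plain NumPy on invented 2-D vectors; it is meant to illustrate the quantity that cosine_similarity returns, not to replace it. It also shows why duplicated rows (such as the repeated Afternoon Tea description) produce off-diagonal entries equal to 1.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms  # assumes no all-zero rows
    return unit @ unit.T

toy_embeddings = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
    [3.0, 3.0],
    [3.0, 3.0],  # exact duplicate of the previous row
])
sims = cosine_similarity_matrix(toy_embeddings)
print(np.round(sims, 3))
```

The diagonal is 1 up to rounding; values such as 0.9999999 or 1.0000004 in the output above are presumably floating-point rounding of the same quantity.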

In [29]:
similarities_sorted = similarities.argsort()
similarities_sorted
Out[29]:
array([[4226, 3946, 1383, ..., 2699, 1575,    0],
       [4226, 4109, 4816, ..., 3422, 2476,    1],
       [ 692, 2202, 3640, ...,    8,    3,    2],
       ...,
       [4774, 4527, 1144, ...,  669, 5364, 5363],
       [4774, 4527, 1144, ...,  669, 5364, 5363],
       [4109, 5244, 5225, ..., 1737, 1826, 5365]])
In [30]:
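id_1 = []
id_2 = []
score = []
for index,array in enumerate(similarities_sorted):
    p=len(array)
    id_1.append(index)
    # argsort is ascending, so array[-1] is normally the document itself
    # and array[-2] its most similar neighbour; for exact duplicates the
    # document itself can end up at array[-2] (see row 5364 below)
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])
index_df = pd.DataFrame({'id_1' : id_1,
                          'id_2' : id_2,
                          'score' : score})
print(p)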
5366
In [31]:
index_df
Out[31]:
id_1 id_2 score
0 0 1575 0.669278
1 1 2476 0.487221
2 2 3 0.666455
3 3 4085 0.771264
4 4 2418 0.781163
... ... ... ...
5361 5361 5360 0.736429
5362 5362 2151 0.626621
5363 5363 5364 1.000000
5364 5364 5364 1.000000
5365 5365 1826 0.534832

5366 rows × 3 columns

Exploring documents 3 and 4085 - similarity score: 0.77

In [32]:
documents[3]
Out[32]:
'Explore, Discover & Enjoy Cambridge with two self-guided, quirky, heritage walks with an optional treasure hunt. \nAre you curious about Cambridge? Looking for an unusual and quirky activity which gets you out in the fresh air whatever the weather? Take one of our self-guided walks with a treasure hunt theme – looping around the better-known sights, as well as some of the more unusual and quirky ones, which combined make Cambridge a fabulous place to explore!\nYou’ll get everything – detailed directions, maps, clues (with answers in the back!), and interesting snippets about the history of Cambridge and the people that have shaped it. \nBuy in booklet or instant download format and explore in your own time. One booklet is enough for four people of all ages to enjoy.\nEnter code List20 at checkout for a 20% discount on any two or more purchases from Curious About.'
In [34]:
documents[4085]
Out[34]:
'**Have fun discovering Manchester with two self-guided, quirky, heritage walks with an optional treasure hunt. Buy in booklet or instant download format.** \n\nAre you curious about Manchester? Looking for an unusual and quirky activity which gets you out in the fresh air whatever the weather? Take one of our self-guided walks with a treasure hunt theme – looping around the better-known sights, as well as some of the more unusual and quirky ones, which combined make Manchester a fabulous place to explore! \n\nYou’ll get everything – detailed directions, maps, clues (with answers in the back!), and interesting snippets about the history of Manchester and the people that have shaped it. \n\nBuy in booklet or instant download format (to use on your mobile device or to print at home) and explore in your own time. One booklet is enough for four people of all ages to enjoy. \n\nEnter code **Fantastic20** at checkout for a 20% discount on any two or more purchases from Curious About.'

Finding the 10 most similar descriptions to document 3

In [35]:
## Let's take document 3
doc_index =3
documents[3]
Out[35]:
'Explore, Discover & Enjoy Cambridge with two self-guided, quirky, heritage walks with an optional treasure hunt. \nAre you curious about Cambridge? Looking for an unusual and quirky activity which gets you out in the fresh air whatever the weather? Take one of our self-guided walks with a treasure hunt theme – looping around the better-known sights, as well as some of the more unusual and quirky ones, which combined make Cambridge a fabulous place to explore!\nYou’ll get everything – detailed directions, maps, clues (with answers in the back!), and interesting snippets about the history of Cambridge and the people that have shaped it. \nBuy in booklet or instant download format and explore in your own time. One booklet is enough for four people of all ages to enjoy.\nEnter code List20 at checkout for a 20% discount on any two or more purchases from Curious About.'
In [36]:
results={}
for i in range(-2, -12, -1):
    similar_index=similarities_sorted[doc_index][i]
    rank=similarities[doc_index][similar_index]
    results[similar_index]=[rank]
In [37]:
results
Out[37]:
{4085: [0.7712636],
 9: [0.7706559],
 2877: [0.7666387],
 4086: [0.7481786],
 4397: [0.7426897],
 2876: [0.7414552],
 8: [0.7288109],
 3906: [0.72502995],
 4399: [0.7207868],
 2100: [0.7016957]}
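The loop above relies on the document itself sitting in the last position of its argsort row, which does not hold when exact duplicates exist. A minimal sketch of a more defensive lookup follows; top_k_neighbours is a hypothetical helper, not part of the notebook, and the similarity matrix is invented. It masks the query document explicitly before ranking.

```python
import numpy as np

def top_k_neighbours(similarities, doc_index, k=10):
    """Return [(neighbour_index, score), ...] for the k most similar rows,
    never including doc_index itself."""
    scores = np.array(similarities[doc_index], dtype=float)  # work on a copy
    scores[doc_index] = -np.inf              # mask the document itself
    k = min(k, len(scores) - 1)
    order = np.argsort(scores)[::-1][:k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]

toy_sims = np.array([
    [1.0, 0.2, 0.9, 0.9],
    [0.2, 1.0, 0.1, 0.1],
    [0.9, 0.1, 1.0, 1.0],
    [0.9, 0.1, 1.0, 1.0],  # rows 2 and 3 are duplicates of each other
])
neighbours_of_3 = top_k_neighbours(toy_sims, doc_index=3, k=2)
print(neighbours_of_3)
```

With the real data this would be called as top_k_neighbours(similarities, doc_index), and a duplicate description would then be reported as a neighbour with score 1.0 instead of the document matching itself.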

Experiment 4: Description Topic Modelling - Deep Learning - BERTopic

Let's model the topics of our descriptions. We are going to reuse the text_embeddings calculated in the previous experiment.

In [38]:
len(documents)
Out[38]:
5366
In [39]:
topic_model = BERTopic(min_topic_size=20).fit(d, text_embeddings)
In [40]:
topics, probs = topic_model.transform(d, text_embeddings)

Visualizing our topics

In [41]:
topic_model.visualize_topics()
In [42]:
#### Visualizing the first 5 keywords of our first 5 topics
In [43]:
topic_model.visualize_barchart()

Visualizing the similarity between topics

In [44]:
topic_model.visualize_heatmap()

Getting the frequency of each topic.

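Topic -1 is BERTopic's outlier topic: it collects the documents that were not assigned to any topic, so we ignore it when interpreting the results.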

In [45]:
# Let's see the frequency of the first 10 topics
topic_model.get_topic_freq()[0:10]
Out[45]:
Topic Count
0 -1 2033
1 0 460
2 1 452
3 2 306
4 3 244
5 4 166
6 5 151
7 6 141
8 7 138
9 8 123
In [46]:
print("Number of topics found %s" %len(topic_model.get_topic_freq()))
Number of topics found 31
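Since topic -1 holds the outliers, it can help to separate it from the real topics and to check how many documents it absorbs. The sketch below rebuilds the first ten rows of the frequency table shown above by hand (so it runs without a fitted model) and is only an illustration of the filtering step; with the real model the table would come from topic_model.get_topic_freq().

```python
import pandas as pd

# First rows of the frequency table shown in Out[45].
freq = pd.DataFrame({
    "Topic": [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8],
    "Count": [2033, 460, 452, 306, 244, 166, 151, 141, 138, 123],
})
n_documents = 5366  # len(documents) reported above

# Documents that BERTopic left unassigned.
outliers = int(freq.loc[freq["Topic"] == -1, "Count"].sum())

# Frequency table without the outlier topic.
real_topics = freq[freq["Topic"] != -1]

print(f"outlier share: {outliers / n_documents:.1%}")
print(real_topics.head())
```

Likewise, the count of 31 printed above is the number of rows in the frequency table, so it presumably includes the -1 row.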

Visualizing the keywords of our topics.

In [47]:
#topic_model.get_topics()
In [48]:
document_0_topic=topics[0]
print("The topic of the document 0 is %s " %document_0_topic)
The topic of the document 0 is 4 
In [49]:
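# keywords of topic 0 (document 0 above was assigned to topic 4; use get_topic(4) to see the keywords of its topic)
topic_model.get_topic(0)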
Out[49]:
[('night', 0.025253118081039222),
 ('party', 0.02469490482411674),
 ('dj', 0.021365412790797948),
 ('club', 0.01935505170803623),
 ('event', 0.018401500482683356),
 ('disco', 0.017069040645670133),
 ('djs', 0.016692359578034638),
 ('music', 0.016653253987767267),
 ('house', 0.015443542234091875),
 ('best', 0.013654573264250496)]
In [50]:
df_desc1["description"].iloc[0]
Out[50]:
'Every month, the Art Post Galleries showcase works of art by local artists via post. This event is for the official unveiling of the art in each post.'

Experiment 5: Exploring the Schedules of Events

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.
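The hierarchy listed above can be made concrete with a small pure-Python walk over one nested record. The record below is invented and only mimics the nesting (schedules, performances, tickets) and the field names visible in the data; iter_ticket_rows is a hypothetical helper, and it assumes every level is present as a list rather than NaN.

```python
def iter_ticket_rows(event):
    """Yield one flat dict per ticket of a nested event record."""
    for schedule in event.get("schedules", []):
        place_id = schedule.get("place_id")  # 1 schedule is in 1 place
        for performance in schedule.get("performances", []):
            for ticket in performance.get("tickets", []):
                yield {
                    "event_id": event.get("event_id"),
                    "place_id": place_id,
                    "performance_ts": performance.get("ts"),
                    "ticket_type": ticket.get("type"),
                    "min_price": ticket.get("min_price"),
                    "max_price": ticket.get("max_price"),
                    "currency": ticket.get("currency"),
                }

# Invented toy record; all identifiers and prices are made up.
toy_event = {
    "event_id": "toy-event-1",
    "schedules": [
        {"place_id": "toy-place-a",
         "performances": [
             {"ts": "2022-02-05T19:00:00+00:00",
              "tickets": [
                  {"type": "Standard", "currency": "GBP", "min_price": 10.0, "max_price": 12.0},
                  {"type": "Children", "currency": "GBP", "min_price": 5.0, "max_price": 5.0},
              ]},
         ]},
        {"place_id": "toy-place-b",
         "performances": [
             {"ts": "2022-02-06T14:00:00+00:00",
              "tickets": [{"type": "Standard", "currency": "GBP", "min_price": 0.0, "max_price": 0.0}]},
             {"ts": "2022-02-06T19:00:00+00:00",
              "tickets": [{"type": "Standard", "currency": "GBP", "min_price": 0.0, "max_price": 0.0}]},
         ]},
    ],
}

rows = list(iter_ticket_rows(toy_event))
for row in rows:
    print(row)
```

One event with two schedules, three performances and four tickets yields four flat rows, which is the same shape the successive explode calls produce in the dataframe.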

Let's start by exploding the schedules column.

In [51]:
df["schedules"]
Out[51]:
0       [{'start_ts': '2022-02-05T19:00:00+00:00', 'en...
1       [{'start_ts': '2022-02-05T19:00:00+00:00', 'en...
2       [{'start_ts': '2022-01-20T10:30:00+00:00', 'en...
3       [{'start_ts': '2022-01-20T10:00:00+00:00', 'en...
4       [{'start_ts': '2022-01-20T10:00:00+00:00', 'en...
                              ...                        
4995    [{'start_ts': '2022-01-28T21:00:00+00:00', 'en...
4996    [{'start_ts': '2022-02-02T19:30:00+00:00', 'en...
4997    [{'start_ts': '2022-01-24T00:00:00+00:00', 'en...
4998    [{'start_ts': '2022-01-24T14:00:00+00:00', 'en...
4999    [{'start_ts': '2022-01-24T10:00:00+00:00', 'en...
Name: schedules, Length: 5000, dtype: object
In [52]:
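# df_schedules is an alias of df (not a copy), so the in-place rename below also renames the columns of df
df_schedules=df
# rename event-level columns so they do not clash with same-named keys (such as tags) in the nested records expanded later
df_schedules.rename(columns={'tags':'event_tags', 'name':'event_name', 'links':'event_links'}, inplace=True)
df_schedules=df.explode('schedules')
#df_schedules
# expand each schedule dict into its own columns
df_s=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)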
In [53]:
df_s.iloc[0]
Out[53]:
event_id                          477e919c-161f-9a69-5334-20060018d1a0
list_id                                                        1626528
status                                                            live
created_ts                                         2021-01-16T01:36:53
modified_ts                                        2021-02-10T09:35:38
event_name                         The Art Post Gallery Monthly Launch
sort_name                              Art Post Gallery Monthly Launch
event_tags                                                [visual art]
descriptions         [{'type': 'default', 'description': 'Every mon...
images               [{'url': 'https://files.list.co.uk/images/2021...
website                                                            NaN
properties                                                         NaN
event_links                                                        NaN
start_ts                                     2022-02-05T19:00:00+00:00
end_ts                                       2022-06-04T19:00:00+01:00
place_id                          3854b834-46f5-d79d-fbdc-87e500000065
tags                                                                []
ticket_summary                                                    free
place                {'place_id': '3854b834-46f5-d79d-fbdc-87e50000...
performances         [{'ts': '2022-02-05T19:00:00+00:00', 'tickets'...
performance_space                                                  NaN
Name: 0, dtype: object

Getting the Frequency of Starting Dates of Events Schedules

In [54]:
df_start=df_s.groupby([pd.to_datetime(df_s['start_ts'])]).size().reset_index()
df_start=df_start.rename(columns={0: "number_of_times"})
df_start=df_start.sort_values(by=['number_of_times'], ascending=False)
df_start.reset_index()
Out[54]:
index start_ts number_of_times
0 608 2022-01-27 19:00:00+00:00 426
1 294 2022-01-23 12:30:00+00:00 204
2 678 2022-01-28 19:30:00+00:00 190
3 945 2022-02-04 19:00:00+00:00 165
4 948 2022-02-04 19:30:00+00:00 165
... ... ... ...
1023 629 2022-01-28 11:10:00+00:00 1
1024 627 2022-01-28 10:50:00+00:00 1
1025 626 2022-01-28 10:45:00+00:00 1
1026 109 2022-01-22 10:25:00+00:00 1
1027 0 2022-01-19 20:30:00+00:00 1

1028 rows × 3 columns

This means that 426 event schedules start at 2022-01-27 19:00:00+00:00.

In [55]:
#### Visualizing the frequency of the schedules' start timestamps computed above
In [56]:
fig = px.histogram(df_start, x='start_ts', y="number_of_times", title="Frequency of Starts Dates Schedules")
fig.show()

Getting the Frequency of End Dates of Events Schedules

In [57]:
df_end=df_s.groupby([pd.to_datetime(df_s['end_ts'])]).size().reset_index()
df_end=df_end.rename(columns={0: "number_of_times"})
df_end=df_end.sort_values(by=['number_of_times'], ascending=False)
df_end.reset_index()
fig = px.histogram(df_end, x='end_ts', y="number_of_times", title="Frequency of End Dates Schedules")
fig.show()
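The end_ts values do not all share the same UTC offset (the first schedule above ends at 19:00:00+01:00, while others use +00:00). How pandas handles mixed offsets without utc=True has varied across versions, so the sketch below, on invented timestamps, normalises everything to UTC before counting. This is an optional precaution under that assumption, not what the cell above does.

```python
import pandas as pd

toy_end_ts = pd.Series([
    "2022-06-04T19:00:00+01:00",
    "2022-06-04T18:00:00+00:00",  # same instant as the line above
    "2022-02-05T19:00:00+00:00",
])

# Convert every timestamp to UTC so equal instants compare equal.
end_utc = pd.to_datetime(toy_end_ts, utc=True)

# Frequency per exact instant, and per calendar day (in UTC).
per_instant = end_utc.value_counts()
per_day = end_utc.dt.date.value_counts()

print(per_instant)
print(per_day)
```

Counting per day rather than per exact timestamp is often easier to read in a histogram, since schedules that end at slightly different times on the same date fall into one bar.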

Experiment 6: Exploring the Performances Tickets of Events Schedules

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the performances column. We cannot explode the performances column unless the schedules column has been exploded first. For that reason, we use the df_s dataframe, in which the schedules column has already been exploded.

In [58]:
df_s
Out[58]:
event_id list_id status created_ts modified_ts event_name sort_name event_tags descriptions images ... properties event_links start_ts end_ts place_id tags ticket_summary place performances performance_space
0 477e919c-161f-9a69-5334-20060018d1a0 1626528 live 2021-01-16T01:36:53 2021-02-10T09:35:38 The Art Post Gallery Monthly Launch Art Post Gallery Monthly Launch [visual art] [{'type': 'default', 'description': 'Every mon... [{'url': 'https://files.list.co.uk/images/2021... ... NaN NaN 2022-02-05T19:00:00+00:00 2022-06-04T19:00:00+01:00 3854b834-46f5-d79d-fbdc-87e500000065 [] free {'place_id': '3854b834-46f5-d79d-fbdc-87e50000... [{'ts': '2022-02-05T19:00:00+00:00', 'tickets'... NaN
1 8654b892-a4f5-d79d-881e-43060018e8ed 1632493 live 2021-02-23T11:05:44 2021-02-23T17:43:01 Chameleons & venue change Chameleons & venue change [music] [{'type': 'third-party', 'description': 'Anyon... [{'url': 'https://files.list.co.uk/images/2021... ... NaN NaN 2022-02-05T19:00:00+00:00 2022-02-05T19:00:00+00:00 43468f54-e1f1-605a-3a6f-23250000d70b [] NaN {'place_id': '43468f54-e1f1-605a-3a6f-23250000... [{'ts': '2022-02-05T19:00:00+00:00', 'tickets'... NaN
2 1a6c0f5c-197d-5907-7e60-bac500134434 1262644 live 2019-04-08T08:31:35 2021-02-25T11:24:59 Curious About Cambridge Curious About Cambridge [activities, days out, history, kids, traditio... [{'type': 'default', 'description': 'Self-guid... [{'url': 'https://files.list.co.uk/images/2019... ... {'phone.info': '01159502151'} NaN 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 1754b890-c3f5-d79d-43e2-209500015ebc [] £7.49-7.99 {'place_id': '1754b890-c3f5-d79d-43e2-20950001... [{'ts': '2022-01-24T10:30:00+00:00', 'duration... NaN
3 53580f5c-197d-5907-cb4f-25c50012519f 1200543 live 2019-01-31T13:14:36 2021-02-25T11:25:51 Curious About Glasgow Curious About Glasgow [activities, days out, history, kids, traditio... [{'type': 'default', 'description': 'Explore G... [{'url': 'https://files.list.co.uk/images/2019... ... NaN NaN 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00 1ed0eadd-4f1e-b60e-36e0-bcc400002980 [] £7.49-7.99 {'place_id': '1ed0eadd-4f1e-b60e-36e0-bcc40000... [{'ts': '2022-01-24T10:00:00+00:00', 'duration... NaN
4 88280f5c-197d-5907-7f38-15c500122da4 1191332 live 2022-01-19T20:17:53.839456 2021-02-25T11:27:08 Curious About Lichfield Curious About Lichfield [activities, days out, traditional & heritage,... [{'type': 'default', 'description': 'Discover ... [{'url': 'https://files.list.co.uk/images/2019... ... NaN NaN 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00 8654b832-6f42-bb9a-a37f-6765000125f0 [] £7.49-7.99 {'place_id': '8654b832-6f42-bb9a-a37f-67650001... [{'ts': '2022-01-24T10:00:00+00:00', 'duration... NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea Tasty Treats Afternoon Tea [afternoon tea, days out, food & drink] [{'type': 'third-party', 'description': 'Treat... NaN ... NaN NaN 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00 e654b810-def5-d79d-d3bc-86f50001f4dc [] £10.00 {'place_id': 'e654b810-def5-d79d-d3bc-86f50001... [{'ts': '2022-01-24T12:00:00+00:00', 'duration... NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea Tasty Treats Afternoon Tea [afternoon tea, days out, food & drink] [{'type': 'third-party', 'description': 'Treat... NaN ... NaN NaN 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00 f164b830-f0f5-d79d-4bf7-8ea500019b50 [] £10.00 {'place_id': 'f164b830-f0f5-d79d-4bf7-8ea50001... [{'ts': '2022-01-24T12:00:00+00:00', 'duration... NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea Tasty Treats Afternoon Tea [afternoon tea, days out, food & drink] [{'type': 'third-party', 'description': 'Treat... NaN ... NaN NaN 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00 f654b810-def5-d79d-600d-86f50001f4de [] £10.00 {'place_id': 'f654b810-def5-d79d-600d-86f50001... [{'ts': '2022-01-24T12:00:00+00:00', 'duration... NaN
4998 5f02b19c-161f-9a69-7732-ea16001b0884 1771652 live 2022-01-19T20:20:48.306734 2022-01-23T12:06:35 Tasty Treats Afternoon Tea Tasty Treats Afternoon Tea [afternoon tea, days out, food & drink] [{'type': 'third-party', 'description': 'Treat... NaN ... NaN NaN 2022-01-24T13:00:00+00:00 2022-01-31T15:00:00+00:00 fd54b840-4af5-d79d-2c77-46f50001f4b2 [] £10.00 {'place_id': 'fd54b840-4af5-d79d-2c77-46f50001... [{'ts': '2022-01-24T13:00:00+00:00', 'tickets'... NaN
4999 8654b8c6-77f5-d79d-fc44-de16001b6233 1794611 live 2022-01-23T12:06:39 2022-01-23T12:06:39 Take 5 Take 5 [exhibition] [{'type': 'third-party', 'description': '“Take... NaN ... NaN NaN 2022-01-24T10:00:00+00:00 2022-02-13T10:00:00+00:00 72d2eadd-4f1e-b60e-f7e0-bcc400006931 [] free {'place_id': '72d2eadd-4f1e-b60e-f7e0-bcc40000... [{'ts': '2022-01-24T10:00:00+00:00', 'duration... NaN

12710 rows × 21 columns

In [61]:
df_p= df_s.explode("performances")
df_p=pd.concat([df_p.drop(['performances'], axis=1), df_p['performances'].apply(pd.Series)], axis=1)
In [62]:
df_p[0:2]
Out[62]:
event_id list_id status created_ts modified_ts event_name sort_name event_tags descriptions images ... tags ticket_summary place performance_space ts tickets links properties duration time_unknown
0 477e919c-161f-9a69-5334-20060018d1a0 1626528 live 2021-01-16T01:36:53 2021-02-10T09:35:38 The Art Post Gallery Monthly Launch Art Post Gallery Monthly Launch [visual art] [{'type': 'default', 'description': 'Every mon... [{'url': 'https://files.list.co.uk/images/2021... ... [] free {'place_id': '3854b834-46f5-d79d-fbdc-87e50000... NaN 2022-02-05T19:00:00+00:00 [{'type': 'Standard', 'currency': 'GBP', 'min_... [{'url': 'http://theartpost.co.uk', 'type': 'b... NaN NaN NaN
1 8654b892-a4f5-d79d-881e-43060018e8ed 1632493 live 2021-02-23T11:05:44 2021-02-23T17:43:01 Chameleons & venue change Chameleons & venue change [music] [{'type': 'third-party', 'description': 'Anyon... [{'url': 'https://files.list.co.uk/images/2021... ... [] NaN {'place_id': '43468f54-e1f1-605a-3a6f-23250000... NaN 2022-02-05T19:00:00+00:00 [{'type': 'Standard', 'currency': 'GBP', 'desc... [{'url': 'https://www.axs.com/uk/events/391221... {} NaN NaN

2 rows × 26 columns

Exploring tickets

Next we explode the tickets column. First we remove the rows whose ticket information is empty.

In [63]:
df_p=df_p.dropna(subset=['tickets'])

Since we don't need all the columns, we select only a few of them.

In [91]:
df_t=df_p[["event_id", "event_name", "descriptions", "event_tags", "tickets", "place_id", "place", "start_ts", "end_ts"]]
In [92]:
df_t[0:5]
Out[92]:
event_id event_name descriptions event_tags tickets place_id place start_ts end_ts
0 477e919c-161f-9a69-5334-20060018d1a0 The Art Post Gallery Monthly Launch [{'type': 'default', 'description': 'Every mon... [visual art] [{'type': 'Standard', 'currency': 'GBP', 'min_... 3854b834-46f5-d79d-fbdc-87e500000065 {'place_id': '3854b834-46f5-d79d-fbdc-87e50000... 2022-02-05T19:00:00+00:00 2022-06-04T19:00:00+01:00
1 8654b892-a4f5-d79d-881e-43060018e8ed Chameleons & venue change [{'type': 'third-party', 'description': 'Anyon... [music] [{'type': 'Standard', 'currency': 'GBP', 'desc... 43468f54-e1f1-605a-3a6f-23250000d70b {'place_id': '43468f54-e1f1-605a-3a6f-23250000... 2022-02-05T19:00:00+00:00 2022-02-05T19:00:00+00:00
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... [{'type': 'Standard', 'currency': 'GBP', 'max_... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... [{'type': 'Standard', 'currency': 'GBP', 'max_... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... [{'type': 'Standard', 'currency': 'GBP', 'max_... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00
In [93]:
df_t1=df_t.explode("tickets")

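Next, we expand each ticket into its own columns and convert the minimum and maximum prices to numeric values. Missing prices are filled with 0, so tickets without price information are counted as free.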

In [94]:
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
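
The explode-and-expand step above can be tried on a small, made-up frame. This is only a sketch: the `toy` data and the names `exploded`, `ticket_cols` and `toy_tickets` are illustrative and not part of the notebook, and it assumes every `tickets` entry is a list of dicts (rows with empty tickets were already dropped).

```python
import pandas as pd

# Toy stand-in for df_t: one row per performance, "tickets" holds a list of dicts.
toy = pd.DataFrame({
    "event_name": ["A", "B"],
    "tickets": [
        [{"type": "Standard", "min_price": "7.49", "max_price": "7.99"},
         {"type": "Children", "min_price": "3.00", "max_price": "3.00"}],
        [{"type": "Standard"}],  # no price information at all
    ],
})

# One row per ticket; reset the index so it is unique before concatenating.
exploded = toy.explode("tickets").reset_index(drop=True)

# Expand each ticket dict into its own columns (missing keys become NaN).
ticket_cols = pd.DataFrame(exploded["tickets"].tolist())
toy_tickets = pd.concat([exploded.drop(columns="tickets"), ticket_cols], axis=1)

# Convert prices to numbers; missing prices become 0, i.e. they count as free.
for col in ("min_price", "max_price"):
    toy_tickets[col] = pd.to_numeric(toy_tickets[col]).fillna(0)

print(toy_tickets)
```

Building the ticket columns from a list of dicts is an alternative to `apply(pd.Series)` that should give the same columns here.
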
In [95]:
df_tickets[0:5]
Out[95]:
event_id event_name descriptions event_tags place_id place start_ts end_ts type currency min_price description max_price
0 477e919c-161f-9a69-5334-20060018d1a0 The Art Post Gallery Monthly Launch [{'type': 'default', 'description': 'Every mon... [visual art] 3854b834-46f5-d79d-fbdc-87e500000065 {'place_id': '3854b834-46f5-d79d-fbdc-87e50000... 2022-02-05T19:00:00+00:00 2022-06-04T19:00:00+01:00 Standard GBP 0.00 NaN 0.00
1 8654b892-a4f5-d79d-881e-43060018e8ed Chameleons & venue change [{'type': 'third-party', 'description': 'Anyon... [music] 43468f54-e1f1-605a-3a6f-23250000d70b {'place_id': '43468f54-e1f1-605a-3a6f-23250000... 2022-02-05T19:00:00+00:00 2022-02-05T19:00:00+00:00 Standard GBP 0.00 tbc 0.00
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc {'place_id': '1754b890-c3f5-d79d-43e2-20950001... 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99

Experiment 6.1: Getting the Frequency of Price Tickets

Here we work only with max_price.

In [96]:
g_maxp=df_tickets.groupby(['max_price']).size().reset_index()
g_maxp=g_maxp.rename(columns={0: "number_of_times"})
#g_maxp=g_maxp.sort_values(by=['number_of_times'], ascending=False)
# Row 0 is the lowest price (max_price == 0), i.e. the free tickets
free_tickets=g_maxp[0:1]
# Remove the free tickets so they do not dominate the plot
g_maxp=g_maxp.drop([0])
g_maxp[:]
Out[96]:
max_price number_of_times
1 2.75 1
2 3.00 4
3 3.30 1
4 4.00 55
5 4.40 1
... ... ...
236 204.15 1
237 210.00 31
238 248.50 15
239 250.00 16
240 1610.00 2

240 rows × 2 columns

In [97]:
fig = px.line(g_maxp, x="max_price", y="number_of_times", title='Frequency of price tickets')
fig.show()
In [98]:
print("The number of free tickets is: %s" %free_tickets["number_of_times"].values[0])
The number of free tickets is: 12538
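
The cell above separates free tickets by position (row 0 of the grouped table). A possible alternative, sketched here on made-up prices, is to select them by value, which does not depend on row order. The names `price_counts`, `is_free`, `n_free` and `paid_counts` are illustrative only.

```python
import pandas as pd

# Toy stand-in for df_tickets with only the column we need.
toy_tickets = pd.DataFrame({"max_price": [0.0, 0.0, 4.0, 4.0, 4.0, 7.99]})

# Count how many tickets share each max_price.
price_counts = (
    toy_tickets.groupby("max_price").size().reset_index(name="number_of_times")
)

# Split free and paid tickets with a boolean mask instead of a row position.
is_free = price_counts["max_price"] == 0
n_free = int(price_counts.loc[is_free, "number_of_times"].sum())
paid_counts = price_counts[~is_free]

print("The number of free tickets is: %s" % n_free)
print(paid_counts)
```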

Experiment 6.2: Getting the frequency of type (Standard, Children) tickets

In [99]:
tickets_type=df_tickets.groupby(['type']).size().reset_index()
tickets_type=tickets_type.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
tickets_type
Out[99]:
type number_of_times
6 Standard 13908
1 Concession 1047
0 Children 586
7 Students 123
3 Family 67
4 Members 55
5 Seniors 43
2 Concessions 4
9 Zoom 2
8 Table of 10 1
In [100]:
px.histogram(tickets_type, x="type", y="number_of_times", histfunc="sum", color="type", title='Frequency of type tickets')
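
The table above lists both "Concession" and "Concessions". If these are meant to be the same ticket type (an assumption, the source data does not say), they could be merged before counting. A minimal sketch on made-up labels, with a hypothetical `label_map`:

```python
import pandas as pd

# Toy ticket types, including the two near-duplicate labels.
toy_types = pd.Series(
    ["Standard", "Concession", "Concessions", "Standard"], name="type"
)

# Hypothetical mapping: treat "Concessions" as "Concession".
label_map = {"Concessions": "Concession"}
clean_types = toy_types.replace(label_map)

# Count tickets per cleaned type, most frequent first.
type_counts = clean_types.value_counts()
print(type_counts)
```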

6.3 Exploring Performances Places

In [102]:
df_t1=df_t.explode("tickets")
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
In [77]:
df_tickets[0:5]
Out[77]:
event_id event_name descriptions event_tags place_id start_ts end_ts type currency min_price description max_price
0 477e919c-161f-9a69-5334-20060018d1a0 The Art Post Gallery Monthly Launch [{'type': 'default', 'description': 'Every mon... [visual art] 3854b834-46f5-d79d-fbdc-87e500000065 2022-02-05T19:00:00+00:00 2022-06-04T19:00:00+01:00 Standard GBP 0.00 NaN 0.00
1 8654b892-a4f5-d79d-881e-43060018e8ed Chameleons & venue change [{'type': 'third-party', 'description': 'Anyon... [music] 43468f54-e1f1-605a-3a6f-23250000d70b 2022-02-05T19:00:00+00:00 2022-02-05T19:00:00+00:00 Standard GBP 0.00 tbc 0.00
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99
2 1a6c0f5c-197d-5907-7e60-bac500134434 Curious About Cambridge [{'type': 'default', 'description': 'Self-guid... [activities, days out, history, kids, traditio... 1754b890-c3f5-d79d-43e2-209500015ebc 2022-01-20T10:30:00+00:00 2022-03-26T10:30:00+00:00 Standard GBP 7.49 Up to three instant downloads or one preprinte... 7.99

6.3.1 Frequency of Performances per Town

In [103]:
df_place=pd.concat([df_tickets.drop(['place'], axis=1), df_tickets['place'].apply(pd.Series)], axis=1)
In [104]:
df_town=df_place.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
# Row 0 is the empty-string town (e.g. online events), so drop it
town=town.drop([0])
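
The cells above expand the place dictionary and drop the empty town by row position. As a sketch under the assumption that places without a town carry an empty string (as the "Online events" place does), the same result can be obtained by filtering on the value. The toy data and the names `place_cols`, `toy_place`, `has_town` and `town_counts` are illustrative.

```python
import pandas as pd

# Toy stand-in for df_tickets: "place" holds one dict per row.
toy_tickets = pd.DataFrame({
    "event_name": ["A", "B", "C", "D"],
    "place": [
        {"name": "Online events", "town": ""},
        {"name": "Venue 1", "town": "Glasgow"},
        {"name": "Venue 2", "town": "Glasgow"},
        {"name": "Venue 3", "town": "Edinburgh"},
    ],
})

# Expand the place dicts into columns, keeping the original row index.
place_cols = pd.DataFrame(toy_tickets["place"].tolist(), index=toy_tickets.index)
toy_place = pd.concat([toy_tickets.drop(columns="place"), place_cols], axis=1)

# Keep only rows with a real town name (neither missing nor empty).
has_town = toy_place["town"].notna() & (toy_place["town"] != "")
town_counts = (
    toy_place[has_town]
    .groupby("town").size()
    .reset_index(name="number_of_times")
    .sort_values("number_of_times", ascending=False)
)
print(town_counts)
```
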
In [105]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town
Out[105]:
town number_of_times
407 London 3552
427 Manchester 905
73 Birmingham 795
226 Edinburgh 472
266 Glasgow 469
... ... ...
324 Hornchurch 1
120 Burscough 1
319 Holywood 1
528 Portaferry 1
82 Blairgowrie 1

718 rows × 2 columns

In [106]:
px.scatter(town, x="town",y='number_of_times', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of Performances per Town")

6.3.2 Frequency of Type tickets per town

In [107]:
town_type=df_town.groupby(['town', 'type']).size().reset_index()
town_type=town_type.rename(columns={0: "number_of_times"})
town_type=town_type[town_type["town"]!=""]
In [108]:
town_type=town_type.sort_values(by=['number_of_times'], ascending=False)
town_type
Out[108]:
town type number_of_times
502 London Standard 3305
526 Manchester Standard 652
90 Birmingham Standard 543
318 Glasgow Standard 465
271 Edinburgh Standard 456
... ... ... ...
353 Hartlebury Children 1
642 Portaferry Standard 1
354 Hartlebury Standard 1
355 Hartlebury Students 1
433 Kilkenny City Standard 1

860 rows × 3 columns

In [109]:
fig = px.scatter(town_type, x='town', y='type', color='number_of_times', title="Frequency of type tickets per town")
fig.show()
In [110]:
px.scatter(town_type, x="town",y='type', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of performances type tickets per town")

6.3.3. Frequency of Max_Price tickets per town

In [111]:
a=df_town[["town", "max_price"]]
a=a[a["town"]!=""]
town_price=a.groupby(['town', 'max_price']).size().reset_index()
town_price=town_price.rename(columns={0: "number_of_times"})
town_price=town_price.sort_values(by=['number_of_times'], ascending=False)
town_price
Out[111]:
town max_price number_of_times
757 London 0.0 2693
124 Birmingham 0.0 681
899 Manchester 0.0 535
432 Edinburgh 0.0 432
509 Glasgow 0.0 393
... ... ... ...
668 Kingston upon Thames 30.0 1
669 Kingston upon Thames 34.0 1
670 Kingswinford 0.0 1
671 Kinross 0.0 1
1427 York 55.7 1

1428 rows × 3 columns

6.3.3.1. Frequency of free tickets per town

In [112]:
free_town_price=town_price[town_price["max_price"]== 0.0]
free_town_price
Out[112]:
town max_price number_of_times
757 London 0.0 2693
124 Birmingham 0.0 681
899 Manchester 0.0 535
432 Edinburgh 0.0 432
509 Glasgow 0.0 393
... ... ... ...
674 Kirton in Lindsey 0.0 1
657 Kilkenny City 0.0 1
663 Kingsbury Episcopi 0.0 1
670 Kingswinford 0.0 1
671 Kinross 0.0 1

643 rows × 3 columns

In [113]:
fig = px.bar(free_town_price, x='town', y='number_of_times', color='number_of_times', barmode='group', title="Frequency of Free Tickets per Town")
fig.show()

6.3.3.2. Frequency of non-free tickets per town

In [114]:
town_price=town_price[town_price["max_price"]!= 0.0]
town_price
Out[114]:
town max_price number_of_times
905 Manchester 7.5 202
907 Manchester 9.5 97
608 Horsham 8.0 51
607 Horsham 4.0 51
866 London 180.0 48
... ... ... ...
666 Kingston upon Thames 28.0 1
667 Kingston upon Thames 29.5 1
668 Kingston upon Thames 30.0 1
669 Kingston upon Thames 34.0 1
1427 York 55.7 1

785 rows × 3 columns

In [115]:
fig = px.bar(town_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Town")
fig.show()
In [116]:
town_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[116]:
max_price number_of_times
town
London 8007.13 859
Birmingham 1050.62 114
Edinburgh 635.47 40
Manchester 624.28 370
Glasgow 559.49 76
... ... ...
Stromness 5.00 1
Chepstow 5.00 1
Bellaghy 4.50 14
Clifton 4.00 2
Swansea 2.75 1

292 rows × 2 columns
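
Note that the max_price column in the table above is the sum of the distinct price points per town, not a total or typical price. If an average price per town is wanted, one option (an assumption about the goal, not something the notebook computes) is to weight each price by how often it occurs. A sketch on made-up numbers:

```python
import pandas as pd

# Toy stand-in for town_price: one row per (town, max_price) pair.
toy_town_price = pd.DataFrame({
    "town": ["Glasgow", "Glasgow", "Dundee"],
    "max_price": [10.0, 20.0, 7.0],
    "number_of_times": [3, 1, 2],
})

# Weight each price by its frequency, then aggregate per town.
weighted = toy_town_price.assign(
    total=toy_town_price["max_price"] * toy_town_price["number_of_times"]
)
per_town = weighted.groupby("town")[["total", "number_of_times"]].sum()
per_town["mean_max_price"] = per_town["total"] / per_town["number_of_times"]

print(per_town)
```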

6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews

6.4.1 Frequency of Price Tickets per Scottish City

In [117]:
scot_towns_price=town_price[town_price['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [118]:
scot_towns_price[0:10]
Out[118]:
town max_price number_of_times
512 Glasgow 7.99 14
4 Aberdeen 7.99 14
433 Edinburgh 7.99 14
16 Aberdeen 45.50 13
511 Glasgow 7.70 12
11 Aberdeen 23.00 10
521 Glasgow 66.50 8
524 Glasgow 77.50 8
446 Edinburgh 107.00 7
411 Dundee 7.00 7
In [119]:
fig = px.bar(scot_towns_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Scottish City")
fig.show()
In [120]:
scot_towns_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[120]:
max_price number_of_times
town
Edinburgh 635.47 40
Glasgow 559.49 76
Aberdeen 344.99 62
Dundee 68.00 27
Perth 45.00 4
Inverness 36.00 8

6.4.2 Frequency of Type Tickets per Scottish City

In [121]:
scot_towns_type=town_type[town_type['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [122]:
scot_towns_type[0:10]
Out[122]:
town type number_of_times
318 Glasgow Standard 465
271 Edinburgh Standard 456
8 Aberdeen Standard 165
257 Dundee Standard 129
415 Inverness Standard 37
624 Perth Standard 30
732 St Andrews Standard 28
255 Dundee Concession 19
270 Edinburgh Concession 15
7 Aberdeen Concession 12
In [123]:
fig = px.bar(scot_towns_type, x='town', y='number_of_times', color='type', barmode='group', title="Frequency of Type Tickets per Scottish City")
fig.show()
In [124]:
scot_towns_type.groupby(["town"]).sum()
Out[124]:
number_of_times
town
Aberdeen 184
Dundee 150
Edinburgh 472
Glasgow 469
Inverness 37
Perth 34
St Andrews 28
In [125]:
df_place.loc[0]
Out[125]:
event_id                     477e919c-161f-9a69-5334-20060018d1a0
event_name                    The Art Post Gallery Monthly Launch
descriptions    [{'type': 'default', 'description': 'Every mon...
event_tags                                           [visual art]
place_id                     3854b834-46f5-d79d-fbdc-87e500000065
start_ts                                2022-02-05T19:00:00+00:00
end_ts                                  2022-06-04T19:00:00+01:00
type                                                     Standard
currency                                                      GBP
min_price                                                     0.0
description                                                   NaN
max_price                                                     0.0
place_id                     3854b834-46f5-d79d-fbdc-87e500000065
list_id                                                       101
name                                                Online events
address                                                          
town                                                             
postal_code                                                      
images          [{'url': 'https://files.list.co.uk/images/2020...
lat                                                           NaN
lng                                                           NaN
Name: 0, dtype: object

6.4.3 Frequency of Schedules Dates per Event and per Scottish City

In [126]:
df_place2=df_place.dropna(subset=['town'])
df_place2
df_scott=df_place2[df_place2['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
df_scott=df_scott[["event_id", "event_name", "event_tags", "town", "start_ts", "end_ts"]]
df_scott[0:3]
Out[126]:
event_id event_name event_tags town start_ts end_ts
3 53580f5c-197d-5907-cb4f-25c50012519f Curious About Glasgow [activities, days out, history, kids, traditio... Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
3 53580f5c-197d-5907-cb4f-25c50012519f Curious About Glasgow [activities, days out, history, kids, traditio... Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
3 53580f5c-197d-5907-cb4f-25c50012519f Curious About Glasgow [activities, days out, history, kids, traditio... Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00

Note: An event can have several schedules, and each schedule has a start and an end date. Therefore, an event can have several start and end dates.
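
Because the frame was exploded by performance and then by ticket, the same start date can appear on several rows of one event. The sketch below, on made-up rows, contrasts the number of rows with the number of distinct start dates per event and town. The names `toy_scott`, `rows` and `distinct_starts` are illustrative.

```python
import pandas as pd

# Toy stand-in for df_scott: repeated rows for the same event and start date.
toy_scott = pd.DataFrame({
    "event_name": ["Tour", "Tour", "Tour", "Gig"],
    "town": ["Glasgow", "Glasgow", "Glasgow", "Perth"],
    "start_ts": [
        "2022-01-20T10:00:00+00:00",
        "2022-01-20T10:00:00+00:00",
        "2022-01-27T10:00:00+00:00",
        "2022-02-05T19:00:00+00:00",
    ],
})

# Per event and town: total rows versus distinct start dates.
summary = (
    toy_scott.groupby(["event_name", "town"])
    .agg(rows=("start_ts", "size"), distinct_starts=("start_ts", "nunique"))
    .reset_index()
)
print(summary)
```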

In [127]:
fig = px.scatter(df_scott, x='start_ts', y="event_name", title="Frequency of starting date per event in Scottish cities")
fig.show()
In [128]:
fig = px.scatter(df_scott, x='end_ts', y="event_name", title="Frequency of ending date per event in Scottish cities")
fig.show()

6.4.4 Grouping Schedules per Event and Scottish City

In [129]:
scott_schedule=df_scott.groupby(['event_name', 'town']).size().reset_index()
scott_schedule=scott_schedule.rename(columns={0: "number_of_times"})
scott_schedule=scott_schedule.sort_values(by=['number_of_times'], ascending=False)
scott_schedule
Out[129]:
event_name town number_of_times
382 Tasty Treats Afternoon Tea Aberdeen 32
383 Tasty Treats Afternoon Tea Dundee 32
112 Danny Kyle Open Stage Glasgow 28
439 The Typewriter Revolution Edinburgh 28
264 Michael Pendry's Les Colombes Edinburgh 28
... ... ... ...
170 Gnoss: The Light of the Moon and Mairi McGilli... Glasgow 1
169 George O'Hanlon Glasgow 1
168 Gearbox: Full Throttle - Scotland Glasgow 1
166 Gaelic Song Workshop with Katherine MacLeod Glasgow 1
468 musicALL: The Fridays Glasgow 1

469 rows × 3 columns

This means, for example, that the "Tasty Treats Afternoon Tea" event appears 32 times in Aberdeen and 32 times in Dundee. Each row counted here is one performance and ticket combination.

In [130]:
t=scott_schedule.groupby(["event_name"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[130]:
number_of_times
event_name
Tasty Treats Afternoon Tea 104
The Typewriter Revolution 28
Michael Pendry's Les Colombes 28
Danny Kyle Open Stage 28
Tapestry: Changing Concepts 24
... ...
Fèis Rois with Avanc 1
Funk Connection 1
Fucked up bingo 1
Front 242 1
musicALL: The Fridays 1

445 rows × 1 columns

This means that the "Tasty Treats Afternoon Tea" event appears 104 times across the selected Scottish cities.

In [131]:
fig = px.bar(t, title="Frequency of Schedules per event")
fig.show()

6.4.5 Exploring Tags per Schedule and Scottish Cities.

In [132]:
a=df_scott.reset_index(drop=True)
tags_town=a[["event_tags", "town"]]
tags_town=tags_town.explode("event_tags")
tags_town
Out[132]:
event_tags town
0 activities Glasgow
0 days out Glasgow
0 history Glasgow
0 kids Glasgow
0 traditional & heritage Glasgow
... ... ...
1372 days out Perth
1372 food & drink Perth
1373 afternoon tea Perth
1373 days out Perth
1373 food & drink Perth

4617 rows × 2 columns

In [133]:
scott_tag=tags_town.groupby(['town', 'event_tags']).size().reset_index()
scott_tag=scott_tag.rename(columns={0: "number_of_times"})
scott_tag=scott_tag.sort_values(by=['number_of_times'], ascending=False)
scott_tag
Out[133]:
town event_tags number_of_times
293 Glasgow music 287
221 Edinburgh visual art 204
274 Glasgow folk 186
153 Edinburgh days out 171
244 Glasgow celtic connections 144
... ... ... ...
237 Glasgow blondie 1
103 Dundee pride 1
105 Dundee rock & pop 1
106 Dundee royal scottish national orchestra 1
93 Dundee lgbt 1

399 rows × 3 columns

This means that we have 287 schedule entries tagged as music in Glasgow, the most frequent tag and city combination.

In [134]:
fig=px.histogram(scott_tag, x="town", y="number_of_times", histfunc="sum", color="event_tags", title='Frequency of tags in Scottish Cities')
fig.update_layout(legend_traceorder="reversed")
fig.show()
In [135]:
t=scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[135]:
number_of_times
event_tags
music 434
days out 398
visual art 381
exhibition 242
folk 215
... ...
indie pop 1
hard trance 1
hard dance 1
grunge 1
world music 1

185 rows × 1 columns

This means that we have 434 schedule entries tagged as music across the selected Scottish cities.

6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh

In [136]:
edi_scott_tag=scott_tag[scott_tag['town'].isin(["Edinburgh"])]
edi_scott_tag
Out[136]:
town event_tags number_of_times
221 Edinburgh visual art 204
153 Edinburgh days out 171
162 Edinburgh exhibition 136
189 Edinburgh music 107
175 Edinburgh history 90
... ... ... ...
165 Edinburgh fantastic for families 1
146 Edinburgh conferences 1
158 Edinburgh dub 1
156 Edinburgh drag 1
200 Edinburgh rap 1

108 rows × 3 columns

In [137]:
edi_scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
Out[137]:
number_of_times
event_tags
visual art 204
days out 171
exhibition 136
music 107
history 90
... ...
metal 1
minimal techno 1
nature 1
paranormal investigation 1
world music 1

108 rows × 1 columns

In [138]:
fig = px.bar(edi_scott_tag, x='town', y='number_of_times', color='event_tags', barmode='group', title="Frequency of schedules tags for Edinburgh")
fig.show()

6.4.6 Histograms of starting/end schedules dates for Edinburgh

In [139]:
scott_start=df_scott.groupby([pd.to_datetime(df_scott['start_ts']), "town"]).size().reset_index()
scott_start=scott_start.rename(columns={0: "number_of_times"})
scott_start=scott_start.sort_values(by=['number_of_times'], ascending=False)
scott_start.reset_index()
Out[139]:
index start_ts town number_of_times
0 8 2022-01-20 10:00:00+00:00 Edinburgh 100
1 14 2022-01-20 11:00:00+00:00 Edinburgh 58
2 7 2022-01-20 10:00:00+00:00 Dundee 42
3 22 2022-01-21 10:00:00+00:00 Edinburgh 38
4 61 2022-01-24 14:00:00+00:00 Aberdeen 32
... ... ... ... ...
251 129 2022-01-29 10:00:00+00:00 Glasgow 1
252 127 2022-01-28 23:00:00+00:00 Edinburgh 1
253 125 2022-01-28 22:30:00+00:00 Aberdeen 1
254 124 2022-01-28 22:00:00+00:00 Edinburgh 1
255 255 2022-02-06 13:00:00+00:00 Glasgow 1

256 rows × 4 columns

In [140]:
ed_scott_start=scott_start[scott_start['town'].isin(["Edinburgh"])].reset_index()
ed_scott_start.groupby(["start_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_start, x='town', y='number_of_times', color='start_ts', barmode='group', title="Frequency of starting date schedules for Edinburgh")
#fig.show()
Out[140]:
index number_of_times
start_ts
2022-01-20 10:00:00+00:00 8 100
2022-01-20 11:00:00+00:00 14 58
2022-01-21 10:00:00+00:00 22 38
2022-01-22 17:30:00+00:00 38 28
2022-01-23 10:00:00+00:00 40 28
... ... ...
2022-01-29 11:00:00+00:00 133 1
2022-01-29 14:00:00+00:00 139 1
2022-01-29 22:30:00+00:00 156 1
2022-01-30 09:30:00+00:00 159 1
2022-02-06 10:00:00+00:00 253 1

82 rows × 2 columns

In [141]:
scott_end=df_scott.groupby([pd.to_datetime(df_scott['end_ts']), "town"]).size().reset_index()
scott_end=scott_end.rename(columns={0: "number_of_times"})
scott_end=scott_end.sort_values(by=['number_of_times'], ascending=False)
scott_end.reset_index()
ed_scott_end=scott_end[scott_end['town'].isin(["Edinburgh"])].reset_index()
ed_scott_end.groupby(["end_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_end, x='town', y='number_of_times', color='end_ts', barmode='group', title="Frequency of ending date schedules for Edinburgh")
#fig.show()
Out[141]:
index number_of_times
end_ts
2022-03-13 10:00:00+00:00 221 38
2022-02-06 10:00:00+00:00 188 28
2022-01-25 20:00:00+00:00 19 28
2022-09-11 10:00:00+01:00 275 28
2022-02-20 10:00:00+00:00 200 24
... ... ...
2022-01-29 10:00:00+00:00 59 1
2022-01-28 23:00:00+00:00 58 1
2022-01-28 22:00:00+00:00 56 1
2022-01-28 21:00:00+00:00 53 1
2022-04-02 15:00:00+01:00 236 1

96 rows × 2 columns
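
The timestamps in this dataset mix UTC offsets (+00:00 in winter, +01:00 in summer). A sketch of one way to handle this, on made-up rows, is to pass `utc=True` to `pd.to_datetime` so every value ends up in a single timezone before grouping. The names `toy`, `end_utc` and `counts` are illustrative.

```python
import pandas as pd

# Toy stand-in for df_scott with mixed UTC offsets in end_ts.
toy = pd.DataFrame({
    "town": ["Edinburgh", "Edinburgh", "Glasgow"],
    "end_ts": [
        "2022-03-13T10:00:00+00:00",
        "2022-09-11T10:00:00+01:00",
        "2022-03-13T10:00:00+00:00",
    ],
})

# Parse to timezone-aware timestamps, all converted to UTC.
end_utc = pd.to_datetime(toy["end_ts"], utc=True)

# Group by the parsed timestamp and the town, as in the cells above.
counts = toy.groupby([end_utc, "town"]).size().reset_index(name="number_of_times")
print(counts)
```

With `utc=True`, 10:00+01:00 becomes 09:00 UTC, so plots on a date axis show UTC times rather than local times.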

In [142]:
fig = px.histogram(ed_scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Edinburgh")
fig.show()
In [143]:
fig = px.histogram(scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Scottish Cities")
fig.show()
In [144]:
fig = px.histogram(scott_end, x='end_ts', y="number_of_times", title="Histogram of Schedules Ending Dates for Scottish Cities")
fig.show()
In [145]:
fig = px.histogram(scott_end, x="end_ts", y="number_of_times", histfunc="sum", title="Histogram on Date Axes")
fig.update_traces(xbins_size="M1")
fig.update_xaxes(showgrid=True, ticklabelmode="period", dtick="M1", tickformat="%b\n%Y")
fig.update_layout(bargap=0.1)
fig.add_trace(go.Scatter(mode="markers", x=scott_end["end_ts"], y=scott_end["number_of_times"], name="daily"))
fig.show()
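
The monthly bins of the histogram above can also be checked as a table. This is a sketch on made-up counts, assuming `end_ts` has already been parsed to timestamps. `pd.Grouper` with `freq="MS"` groups rows by calendar month; `toy_end` and `monthly` are illustrative names.

```python
import pandas as pd

# Toy stand-in for scott_end: parsed end dates with their counts.
toy_end = pd.DataFrame({
    "end_ts": pd.to_datetime(
        [
            "2022-01-25T20:00:00+00:00",
            "2022-01-29T10:00:00+00:00",
            "2022-03-13T10:00:00+00:00",
        ],
        utc=True,
    ),
    "number_of_times": [28, 1, 38],
})

# Sum the counts per calendar month (months without rows get 0).
monthly = toy_end.groupby(pd.Grouper(key="end_ts", freq="MS"))["number_of_times"].sum()
print(monthly)
```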

6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time

In [146]:
b=df_scott.reset_index(drop=True)
tag_town_time=b[["event_tags", "town", "start_ts", "end_ts"]]
tag_town_time=tag_town_time.explode("event_tags")
tag_town_time
Out[146]:
event_tags town start_ts end_ts
0 activities Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
0 days out Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
0 history Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
0 kids Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
0 traditional & heritage Glasgow 2022-01-20T10:00:00+00:00 2022-03-26T10:00:00+00:00
... ... ... ... ...
1372 days out Perth 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00
1372 food & drink Perth 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00
1373 afternoon tea Perth 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00
1373 days out Perth 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00
1373 food & drink Perth 2022-01-24T12:00:00+00:00 2022-01-31T12:00:00+00:00

4617 rows × 4 columns

In [147]:
scott_tag_end=tag_town_time.groupby([pd.to_datetime(tag_town_time['end_ts']), "event_tags"]).size().reset_index()
scott_tag_end=scott_tag_end.rename(columns={0: "number_of_times"})
scott_tag_end=scott_tag_end.sort_values(by=['number_of_times'], ascending=False)


scott_tag_start=tag_town_time.groupby([pd.to_datetime(tag_town_time['start_ts']), "event_tags"]).size().reset_index()
scott_tag_start=scott_tag_start.rename(columns={0: "number_of_times"})
scott_tag_start=scott_tag_start.sort_values(by=['number_of_times'], ascending=False)
In [148]:
scott_tag_start
Out[148]:
start_ts event_tags number_of_times
49 2022-01-20 10:00:00+00:00 visual art 132
39 2022-01-20 10:00:00+00:00 exhibition 128
38 2022-01-20 10:00:00+00:00 days out 114
77 2022-01-20 11:00:00+00:00 visual art 74
41 2022-01-20 10:00:00+00:00 history 70
... ... ... ...
596 2022-01-29 22:00:00+00:00 hip hop 1
595 2022-01-29 22:00:00+00:00 hard house 1
594 2022-01-29 22:00:00+00:00 hard dance 1
593 2022-01-29 22:00:00+00:00 electronic 1
956 2022-02-06 13:00:00+00:00 music 1

957 rows × 3 columns

6.4.7.1 Frequency of schedules Starting Date in Scottish City

In [149]:
#fig = px.bar(scott_tag_start, x='event_tags', y='start_ts', color='number_of_times', barmode='group', title="Frequency of schedules tags per Scottish City")
#fig.show()

fig = px.scatter(scott_tag_start, x='start_ts', y='number_of_times', title="Frequency of schedules Starting Date in Scottish City.")
fig.show()

6.4.7.2 Frequency of schedules Ending Date in Scottish City

In [150]:
fig = px.scatter(scott_tag_end, x='end_ts', y='number_of_times', title="Frequency of schedules Ending Date in Scottish City.")
fig.show()

6.4.7.3 Scheduled tags and Starting Dates in Scottish City

In [151]:
fig = px.scatter(scott_tag_start, x='start_ts', y='event_tags', title="Scheduled Tags and Starting Dates in Scottish City.")
fig.show()

6.4.7.4 Scheduled Tags and Ending Dates in Scottish City

In [152]:
fig = px.scatter(scott_tag_end, x='end_ts', y='event_tags', title="Scheduled Tags and Ending Dates in Scottish City.")
fig.show()